Improving Word Sense Disambiguation Using Topic Features
Authors
Abstract
This paper presents a novel approach to exploiting global context for word sense disambiguation (WSD), using topic features constructed by running the latent Dirichlet allocation (LDA) algorithm on unlabeled data. The features are incorporated into a modified naïve Bayes network alongside other features such as the part-of-speech of neighboring words, single words in the surrounding context, local collocations, and syntactic patterns. In both the English all-words task and the English lexical sample task, the method achieved significant improvement over the simple naïve Bayes classifier and higher accuracy than the best official scores on Senseval-3 for both tasks.
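The idea of treating a topic as one more feature next to local context can be sketched as follows. This is a minimal illustration, not the paper's modified naïve Bayes network: it is a plain naïve Bayes classifier with add-one smoothing, and it assumes each instance already carries a topic id inferred by an LDA model trained elsewhere. The class name `NaiveBayesWSD`, the feature names (`prev_word`, `prev_pos`, `topic`), the sense labels, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Plain naive Bayes over categorical features with add-one smoothing."""

    def __init__(self):
        self.sense_counts = Counter()               # sense -> number of instances
        self.feature_counts = defaultdict(Counter)  # (sense, feature) -> value counts
        self.values = defaultdict(set)              # feature -> values seen in training

    def train(self, instances):
        # Each instance is a (feature dict, sense label) pair.
        for features, sense in instances:
            self.sense_counts[sense] += 1
            for name, value in features.items():
                self.feature_counts[(sense, name)][value] += 1
                self.values[name].add(value)

    def log_score(self, features, sense):
        total = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total)
        for name, value in features.items():
            counts = self.feature_counts[(sense, name)]
            # Add-one smoothing; the extra slot stands for unseen values.
            vocab = len(self.values[name]) + 1
            score += math.log((counts[value] + 1) / (sum(counts.values()) + vocab))
        return score

    def predict(self, features):
        return max(self.sense_counts, key=lambda s: self.log_score(features, s))


# Toy training data for the ambiguous word "bank".
# "topic" is assumed to be the dominant LDA topic of the surrounding document.
train = [
    ({"prev_word": "river", "prev_pos": "NN", "topic": "t3"}, "bank/shore"),
    ({"prev_word": "the", "prev_pos": "DT", "topic": "t3"}, "bank/shore"),
    ({"prev_word": "the", "prev_pos": "DT", "topic": "t7"}, "bank/finance"),
    ({"prev_word": "central", "prev_pos": "JJ", "topic": "t7"}, "bank/finance"),
]

model = NaiveBayesWSD()
model.train(train)

# The local features ("the", DT) are equally likely under both senses,
# so only the topic feature separates the two test instances.
in_finance_doc = model.predict({"prev_word": "the", "prev_pos": "DT", "topic": "t7"})
in_nature_doc = model.predict({"prev_word": "the", "prev_pos": "DT", "topic": "t3"})
```

In this toy setup the local context is uninformative, so the prediction follows the topic feature. That is the kind of global-context signal the abstract describes, here reduced to a single categorical value per instance.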
Similar resources
Word Sense Disambiguation of Ambiguous Persian Words Using the LDA Topic Model
Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper, a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word, and second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...
Use of Combined Topic Models in Unsupervised Domain Adaptation for Word Sense Disambiguation
Topic models can be used in unsupervised domain adaptation for Word Sense Disambiguation (WSD). In the domain adaptation task, three types of topic models are available: (1) a topic model constructed from the source domain corpus, (2) a topic model constructed from the target domain corpus, and (3) a topic model constructed from both domains. Basically, three topic features made from each to...
KSU KDD: Word Sense Induction by Clustering in Topic Space
We describe our language-independent unsupervised word sense induction system. This system uses only topic features to cluster different word senses in their global context topic space. Using unlabeled data, the system trains a latent Dirichlet allocation (LDA) topic model and then uses it to infer the topic distributions of the test instances. By clustering these topic distributions in their top...
Word Sense Disambiguation using case based Approach with Minimal Features Set
In this paper we present a case-based approach to word sense disambiguation using a minimal feature set. For the disambiguation, we took only two features for two different methods: post-bigram (immediate left word with ambiguous word – l1w) and pre-bigram (ambiguous word with immediate right word of it – wr1). To classify the cases for disambiguation, we followed three steps: instance or...
MSS: Investigating the Effectiveness of Domain Combinations and Topic Features for Word Sense Disambiguation
We participated in the SemEval-2010 Japanese Word Sense Disambiguation (WSD) task (Task 16) and focused on the following: (1) investigating domain differences, (2) incorporating topic features, and (3) predicting new unknown senses. We experimented with Support Vector Machines (SVM) and Maximum Entropy (MEM) classifiers. We achieved 80.1% accuracy in our experiments.